Abstract: To ensure equal access to high-quality education, the global expansion of universal
basic education has included accountability measures in the form of academic tests. Presently 
the majority of countries participate in national testing; however, the past two decades have seen 
a substantial shift in test characteristics and aims. This article investigates the global 
transformation toward testing for accountability, where intentional or unintentional positive or 
negative consequences are applied to educators (teachers and administrators) based on their 
students’ test scores, in light of the emerging global culture identified by World Culture
theorists. Elements of the world culture - including the expansion of western education models,
an emphasis on academic intelligence, faith in science as a rational path to truth, and the 
decentralization of authority to the local level - justify the establishment of testing for 
accountability systems. Descriptive evidence from regional and international datasets, such as
PISA, PIRLS, and TIMSS, illustrates the speed at which this transformation occurs. The
convergence of countries toward testing for accountability and its position as an increasingly
normative policy lever is illustrated in brief vignettes from the diverse systems of Hungary, 
Mexico, and South Korea. As testing for accountability becomes embedded in the world culture 
as a legitimate tool for education reform, it is less prone to critical reflection. If the potential
benefits and concerns of testing for accountability, outlined in this article, are not thoughtfully
evaluated, this global transformation will lead to a testing culture that is internalized as normative
and adopted as individual values. 

Keywords: accountability; testing; education policy; New Right; globalization; World Culture 


Introduction 

With the importance of education increasing globally, international debate has turned from 
providing educational access to ensuring efficiency and equity in educational outcomes. To certify 
the available education is of high quality, the global expansion of universal basic education has 
included accountability measures in the form of academic tests. Understanding the transformation 
from traditional high-stakes exams that place responsibility for test scores on the student to testing
for accountability, which places intentional or unintentional positive or negative consequences on
educators (teachers and administrators) for their students’ performance, is important because
different approaches to testing are likely to lead to different student outcomes (Harris & Herrington,
2006). This article investigates this global transformation through an exploration of the components
of the emerging global culture, as identified by World Culture theorists. Although World Culture theory is
often criticized by both proponents and opponents as merely a descriptive theory that fails to
consider the potential outcomes of the emerging world culture (Carney, Rappleye, & Silova, 2012;
Schofer et al., 2012), this article explains how the self-proclaimed components of world culture
support and perpetuate the movement toward testing for accountability.

The article starts by using data from international assessments to illustrate the rapid 
expansion of testing and contrast the current turn toward testing for accountability with more 
traditional understandings of high-stakes examinations. This is followed by a historical look at the 
early adopters of testing for accountability, the United States and the United Kingdom, and the role 
of New Right ideology in their testing reform. Sections four and five situate the testing for
accountability trend within the contemporary “global educational reform movement” (see Sahlberg,
2010, p. 47), introduce World Culture theory, and explain how multiple elements of the emerging
world culture legitimate testing for accountability systems. To understand the shift toward greater
accountability, national testing policy categories are outlined in section six and illustrative national 
examples from Hungary, Mexico, and South Korea are provided in section seven. Finally, the 
concluding section asks whether this global transformation is moving the world toward a normative 
testing culture that has the potential to influence multiple facets of society. 

Expansion of Testing 

Testing has long been used to assess student understanding, inform instruction, and identify
students for academic advancement. However, the latter half of the 20th century signified a shift in
the type of test administered, illustrated by a sharp rise in the use of large-scale standardized tests. In
investigating the educational systems of 21 industrialized countries between 1974 and 1999, Phelps
(2000) found that 18 increased the number of annually administered large-scale tests, leading him to



Education Policy Analysis Archives Vol. 22 No. 116

conclude that there is a “clear trend towards adding, not dropping testing programs” (p. 19). Since 
1980 nearly all European countries have adopted national testing policies (Eurydice, 2009a). 
Additionally, this trend is not limited to the industrialized north as educational reformers around the 
globe insist that “improving national (or state) testing systems is an important, perhaps the key, 
strategy for improving educational quality” (Chapman & Snyder, 2000, p. 457). The acceleration in 
national test policy adoption is perhaps best illustrated in the work of Benavot and Tanner (2007). 
They found that between the years 1995 and 2006 the number of countries worldwide that 
participate in an annual national testing program more than doubled from 28 to 67. As of 2006, 81% 
of developed countries and 51% of developing countries have conducted at least one national test. 

Concurrent with the rise in national testing programs is increasing participation in cross-national
assessments. The first cross-national assessments were initiated in the 1960s and were
originally regionally focused with participation solely from industrialized countries. For example, 
twelve countries participated in the First International Mathematics Study in 1964 with only Japan 
and the United States located outside of Europe. However, since the 1990s international 
assessments have included a diverse array of countries outside the industrialized world as well as 
provincial economies, such as Shanghai, China and Dubai, United Arab Emirates. As illustrated in 
Figure 1, this has resulted in a steady increase in the number of participants in the three largest 
international studies: Trends in International Mathematics and Science Study (TIMSS), Programme for
International Student Assessment (PISA), and Progress in International Reading Literacy Study 
(PIRLS). All studies show a roughly 50% increase in participation between 1995 and 2012. 



Figure 1. National participation in select international student assessments 


With the support of international agencies, the late 1990s also saw the creation of regional
assessments in developing regions. For example, the Southern and Eastern Africa Consortium for
Monitoring Educational Quality (SACMEQ) completed its first round of data collection in seven
countries in 1999. Since the initial assessment of reading literacy among sixth graders, the number of
participant countries has more than doubled, with 16 countries currently partaking in SACMEQ IV,
scheduled for completion in 2014. In Latin America, participation in the Latin American Laboratory
for Assessment of the Quality of Education (LLECE) has also increased, although at a slower rate,
from 13 countries in 1997 to 17 countries in 2006.

At the school or classroom level a shift is also evident in the types of tests students
complete, moving away from strictly teacher-administered tests to large-scale standardized tests.
Using data from PISA questionnaires, Figure 2 illustrates that the percentage of schools that
participate in more than two standardized tests a year has increased from 13.7% to 29.9% over just
a ten-year span. The number of standardized tests students take varies greatly across countries, with
many students exposed to a substantial number of standardized tests over their compulsory school
career. For example, in Denmark students take 36 national tests during their time in primary and
lower secondary school, while in China students take up to nine standardized subject tests each year
(Schmidt, Houang, & Shakrani, 2009).


[Figure 2: bar chart. Horizontal axis: Year (2000, 2003, 2009). Vertical axis: percentage, 0% to 60%. Series: Never; 1-2 Times/Year; >2/Year.]

Figure 2. School participation in standardized external tests 


Turn Toward Testing for Accountability 

This global transformation is not limited to the number of tests but instead encapsulates a 
qualitative shift in testing characteristics and aims. “High-stakes exams” have a long history in many 
countries. Traditional high-stakes exams focus on student knowledge, with test outcomes
determining students’ academic and career trajectories (Eckstein & Noah, 1993). Perhaps the best
historical example of a high-stakes test comes from the Chinese Civil Service Exam. Originating in
the third century B.C. and formally instituted during the T’ang dynasty (618-907 AD), the Chinese
Civil Service Exam was based on the Confucian belief that selection into the ruling class should be 
based on individual merit (Eckstein & Noah, 1993). The test was originally composed of six distinct 
examinations but was later narrowed down to one exam, the chin-shih examination, which remained 
in place until 1904. By the Ming period (1368-1644) the exam was seen as the only legitimate route 
to government positions as it was “more objective and less open to particularistic influence than the 
recommendatory system” (Ho, 1964, p. 15). Additionally, high-stakes testing for students has been in
place in countries throughout Europe, including Iceland, Portugal, and the United Kingdom (U.K.),
since at least the mid-1940s (Eurydice, 2009a).

While much of the early literature muddles high-stakes testing by combining effects on
students and teachers into one general category (see Chapman & Snyder, 2000; Paris & McEvoy,
2000), the current transformation toward testing for accountability shifts the high stakes from
students to educators. Teasing out tests that place high stakes on students from those that place high 
stakes on educators is important because they provide a different set of motivations and are likely to 
lead to diverse student outcomes (Harris & Herrington, 2006). Identification of the dominant 
features of a test, including where the test is administered and on whom the test results focus,
can illuminate these differences. Table 1 identifies tests that are administered within the K12
structure and make educators responsible for their students’ test scores as testing for accountability.
Testing for accountability includes the application of formal or informal, positive or negative 
consequences on educators dependent on their students’ performance measures (Figlio & Loeb, 
2011). Educational systems that apply testing for accountability are interested in ensuring those that 
are delegated authority to educate children are “answerable to another level of authority for their 
prescribed responsibility” (Smeed & Victory, 2010, p. 28), explicitly answering the essential 
questions linked to accountability: accountability “to whom” and accountability “for what” (William, 
2010). The contentious nature of testing for accountability indicates that answers to these questions 
remain challenged (Dorn, 1998; Kornhaber, 2004a). 

Table 1 

Test Foci and Administration 

                                              Focus on Students        Focus on Teachers/Schools

Administration Within K12 School Structure    Graduation Exam          Testing for Accountability

Administration Outside K12 School Structure   College Entrance Exam    Teacher Certification Exam


Tests typically associated with high-stakes testing are illustrated in the first column of Table 1.
Within the K-12 school structure these tests are compulsory for all students (Eurydice, 2009a). High-stakes
tests that determine the academic trajectory (via tracking) of a student or provide them with
access to a subsequent education level can be identified as testing for advancement. This differs from
testing for assessment, low-stakes exams that focus on students but are designed to assess their
academic progress and direct instruction. Testing for accreditation is present when the aim of a test is to
provide a credential, identifying an individual as a member of a distinct social group. Teacher
certification exams and legal bar exams are both examples of testing for accreditation. The
categorization of testing aims is not mutually exclusive; for example, a high school graduation exam 
performs the function of both testing for advancement and testing for accreditation. 

Holding different actors responsible for student achievement scores has shifted the blame 
from low performing students to low performing schools (Apple, 1999). When test scores remain 
aggregated at the individual level, parents often feel a personal interest in improving the quality of 
education in the classroom, leading to a more collaborative relationship with the teacher as both the 
parent and teacher work together to improve student learning. In contrast, the aggregation of results
to a classroom or school level allows parents to place blame squarely on the teacher/school, leading
to a more conflictual relationship in which the community questions why the school isn’t
maximizing the students’ learning while taking little responsibility on itself. This hostile relationship
may become increasingly common as tests for advancement and tests for assessment are
increasingly transformed into tests for accountability through school-level aggregation. In some
countries this evolution is so great that testing has become synonymous with accountability
(Froese-German, 2001), suggesting testing no longer has other aims.

The ultimate outcome of testing for accountability reforms is largely shaped by educators 
who play a dual role as policy implementer and student influencer. Teachers, through their position
in the classroom hierarchy and students’ perception of the teacher as the legitimate classroom authority, are
in a position to significantly influence their students’ behavior and cultural understanding (Smith,
2012). The educator position encompasses both autonomy and obligation. From this position 
educators have to prioritize often-competing demands while continuing practices that are in the best 
interest of their students (McLaughlin, 1991; Shulman, 1983). The perceived best interests of the 
student may include adapting classroom routines, structures, or instructional practices to individual 
students’ academic needs. Therefore, unlike student-focused testing, which affects one student at a
time, testing for accountability affects a classroom full of students simultaneously by leveraging 
behavior change in educators. 

Educators as local policy enactors have been identified as “street level bureaucrats,”
individuals who have the ultimate responsibility for policy implementation (Shulman, 1983). With
autonomy, educators interpret or make sense of the policy, shaping how it is implemented as well as 
the resulting consequences (Rosen, 2009). Differences in the beliefs and experiences of educators 
can lead to policy enactment that is substantially different from the originally articulated policy.
Although testing for accountability imposes on and restricts educators’ professional autonomy (Luna
& Turner, 2001), external accountability measures have been successful in altering the in-class and
administrative decisions of educators (Booher-Jennings, 2005). The greatest concern for teachers is
their personal and professional survival (Gilles, Cramer, & Hwang, 2001) and when faced with
testing for accountability this survival instinct is heightened (Nichols & Berliner, 2007), leading
“educators who feel oppressed by an ineffective and potentially harmful evaluation system [to] feel 
justified” (Paris & McEvoy, 2000, p. 150) in altering their behavior to maintain their livelihood. 
Given the autonomy and authority of educators, the linkage of student success with educator 
survival has the potential to create large-scale changes in the academic quality and success of
multiple students simultaneously, making testing for accountability policies substantially different
from student-focused testing.

Early Adopters of Testing for Accountability 

The shift towards testing for accountability has been a relatively recent phenomenon, seen 
first in the United States and the United Kingdom. The movement towards testing for accountability 
in the U.S. is rooted in a national history that wants “both a system that rewards merit and a system 
that generates equality” (Dorn, 2007, p. xiv). Linking tests to student accountability in the U.S. 
started in the 1960s and by the end of the 1970s many states established a link between test scores 
and school accountability (Dorn, 2007). The 1970s was characterized by the entrenchment of human 
capital arguments for education as an investment (Becker, 1962; Schultz, 1961). When combined 
with the shift in school funding from primarily local sources to a mix of local and state control, this 
investment perspective created an atmosphere where “legislators wanted some quid pro quo for 
spending more on education” (Dorn, 2007, p. 6). The 1970s was dominated by minimum 
competency standardized tests, which mirrored the decline in intelligence testing, the latter due to
civil rights challenges against using IQ for student placements and an increasing desire to dispel 
human potential as fixed (Kornhaber, 2004a). 

The 1980s saw the rise of the New Right in the U.S. and the U.K. Beginning with the
governments of President Reagan in the U.S. and Prime Minister Thatcher in the U.K. (Figlio &
Loeb, 2011), New Right calls for improved schooling were articulated “by national policymakers
into an umbrella of neoliberal and neoconservative ... reforms” (Carl, 1994, p. 315). This ideological
shift was largely accepted by the public because of an increasing discontent with and distrust of public
administrators (Dorn, 2007), due in part to: (1) the economic recession of the 1970s, which led to a
decline in social services as the “crisis of the welfare state” led to calls for increased fiscal
accountability (Hopmann, 2008), (2) an increasing anxiety about the state of U.S. schools and the
ability of U.K. schools to equip students with the skills needed to excel in the economy (Fitz, 2003),
(3) the inability of desegregation attempts to open up mobility paths for minority students while
concurrently threatening the privileged position of the middle class and white parents, and (4) the
alienation of working class and minority families from the traditional education experience (Carl,
1994).

The neoliberal push in education from the New Right was dependent on their belief that 
private schools were more dynamic and innovative than the rigid and bureaucratic public schools, 
largely because they were situated within a market (Carl, 1994). New Right supporters used the work 
of Friedman (1955, 1962) and Chubb and Moe (1990) to justify their position that markets are the 
solution to a failing school system. Friedman (1962) believed the privatization of education would 
create a higher quality, more efficient product while Chubb and Moe (1990) suggested private 
schools were more efficient due to their organizational structure that provides principals with more 
autonomy and power. Forcing public schools to compete with private schools would, therefore, 
result in either an improvement in the product or a closure of poor performing schools. The 
neoliberal emphasis results in the promotion of individual responsibility amongst self-interested 
actors, effectively removing any potential societal blame (Hursh, 2007). 

The neoconservative call for uniformity and standardization complemented the neoliberal
push for market-based accountability. Concerned with the relative permissiveness of the 1960s and 
1970s, neoconservatives identified the school system as one in crisis (Carl, 1994). To remedy the 
crisis neoconservatives believed the education system must create uniformity in classroom 
curriculum and increase enforcement mechanisms (Hill, 2006). In the 1980s, instead of reflecting 
nostalgically on the past, neoconservatives pressed for “the development of coherent prescriptions 
for change - usually by hitching the neoconservative cart to the neoliberal horse” (Carl, 1994, p. 
300). 

New Right Policy Shifts in the U.S. 

The racial achievement gap in the U.S. was part of a publicly perceived crisis, increasing 
concerns about the state of schooling in America during the 1970s and 1980s (Tyack & Cuban,
1995). Measuring the achievement gap required a shift in funding and reporting from inputs of
education to outputs - typically measured in test scores (Hanushek & Raymond, 2004; Supovitz, 
2009). The failing of the American school system and its inability to reduce the achievement gap 
were captured in the 1983 report, A Nation at Risk (Hopmann, 2008). The report and supporting 
rhetoric of the New Right helped shape the problem of U.S. education by “implicitly or explicitly 
attributing responsibility to particular individuals, institutions, or conditions” (Rosen, 2009, p. 276). 
The association of a problem with a solution is more easily accepted by the public when it is put in 
simplistic terms and aligns with already established cultural beliefs (Rosen, 2009). During the 1980s, 
the simple problem was poor performing schools, shifting the target of accountability from the 
individual to the schools (Lee, 2008). This “distinct change in direction and philosophy” (Harris & 
Herrington, 2006, p. 227) resulted in an enormous increase in the number of states with an 
accountability system from four in 1993 to forty in 2000 (Hanushek & Raymond, 2005). Testing for 
accountability was also linked to the modern school choice movement, a New Right push to put
increased pressure on schools through market-based consumer choice. Comparative school-level
results, produced in testing for accountability systems, would help inform parental decision-making.
The result was a significant increase in choice options in the U.S. in the 1980s and early 1990s 
(Eurydice, 2009a; Smith & Rowland, 2014). 

Neoliberal ideas have dominated education reform in the U.S. since A Nation at Risk (Hursh,
2007). In the 1980s, Texas became the first state to implement testing for accountability (Yarema,
2010). This movement gained prominence nationally when President Bush and state governors met 
in 1989 to endorse the tying of student test scores to school performance. At approximately the 
same time the movement to standardize curriculum and instruction was gaining momentum 
(Hopmann, 2008). Standards were seen as essential for equity, ensuring that everyone was held to 
the same high expectations (Stotsky, 2000). Testing for accountability was encouraged because
aligning tests with higher standards would make the tests worth teaching to (Cohen & Ball, 1999;
Spalding, 2000; Viadero, 1994), especially if the high standards measure important content (Koretz,
2008). The idea of national standards was embodied by President Clinton’s “Goals 2000,” which was
supported by the Educate America Act of 1994. Goals 2000 pushed for national standards and 
implemented voluntary national testing in grades 4, 8, and 12 (Carl, 1994). 

In 2001, the reauthorization of the Elementary and Secondary Education Act (ESEA), 
known as No Child Left Behind (NCLB), became the first national framework linking standards, 
assessment, and accountability (Datnow & Park, 2009). NCLB linked school performance with 
student scores on standardized examinations and can be understood as “an evolution of previous
attempts to use high-stakes tests to improve educational outcomes” (William, 2010, p. 110). Schools 
were judged on their ability to make adequate yearly progress (AYP) towards 100% student 
proficiency on achievement tests by 2014. Schools that failed to reach AYP for three consecutive 
years were subject to corrective action, including potential school closure (Springer, 2008). The 
emphasis on standardized tests was a boon for the testing industry, which recorded a massive increase
in test sales in the U.S. from $260 million annually in 1997 to $700 million annually in 2008, a lower 
bound estimate that does not take into account test support materials and services (Frontline, 2008). 

New Right Policy Shifts in the U.K. 

In a 1976 speech at Ruskin College, Prime Minister Callaghan signaled a change in education
policy toward the use of market mechanisms in the U.K. Central to his speech was the notion that
“the education system was not providing industry and the economy with what it required in terms of 
a skilled and well-educated workforce” (Furlong & Phillips, 2001, p. 6). The result of his speech was 
a shift in blame from larger societal issues, such as poverty and inequality, to ineffective schools 
(Hursh, 2005b). The subsequent Conservative government, led by Prime Minister Thatcher, made it 
clear that the individual had the responsibility to combat ineffective schools by making informed 
consumer choices that would pressure poor performing schools to change their practice. 

Influenced by New Right ideology, the 1988 Education Reform Act and the 1992 Schools 
Act established national testing based on a national curriculum for ages 7, 11 and 14, and required 
local education authorities to produce comparable school-level examination results, known as league
tables (Edwards & Whitty, 1992; Teelken, 1999; West & Pennell, 2000). The 1988 Education Reform Act
was a simplified version of policy suggestions laid out by the Task Group on Assessment and
Testing (TGAT), an expert group of practitioners and policy researchers that proposed testing for
instructional and diagnostic purposes as well as systems evaluation and accountability. However,
amid concern that the TGAT brief was a subversion by left-wing educators, the brief was
dismantled, leaving policies that focus primarily on evaluating the system, schools, and educators
(James, 2011). The 1988 act had at least four substantial effects on education policy in the U.K.
First, a national curriculum was created and designed to occupy 70% of schools’ instructional time
(Hursh, 2005b). Second, standardized tests were established at four Key Stages, with the publication
of results through league tables providing parents with the information necessary to make an
informed consumer choice. A national curriculum and standardized tests were necessary in order to
decentralize fiscal decision-making to the school level (Edwards, 2001). Third, open enrolment was
established, under which students could enroll in the school of their choice, provided space was available.
However, space is rarely available in high performing schools. Additionally, open enrolment was
linked to per-pupil funding, requiring schools to compete to ensure they have adequate school
enrollment (Edwards & Whitty, 1992; Fitz, 2003). Finally, the power of Local Education Authorities 
(LEAs) was limited as curricular and pedagogic control was taken by the national government and 
fiscal decision-making was devolved to the school level. This resulted in weaker LEAs that 
essentially became the deliverer of national level policy (Fitz, 2003). 

The entrenchment of testing for accountability continued in the 1990s. In 1991, the Parent 
Charter enhanced the choice environment by emphasizing the rights of parents as active choosers in 
their child’s education, providing means for parents to evaluate the schools based on league tables 
and relocate their children if necessary to higher performing schools, and the establishment of the Office for 
Standards in Education (Ofsted) in 1992 increased the presence of accountability in education (James, 
2011). The Labour Party further strengthened the national curriculum by specifying teaching 
methods in math and literacy (Hursh, 2005b). Somewhat surprisingly, it has “not only been keenly 
committed to using the available levers created by the Conservatives, but ... added a raft of its own, 
to maintain pressure on schools to improve levels of attainment” (Fitz, 2003, p. 234). The Blair 
government of the late 1990s and early 2000s supported policies that prompted between school 
competition (DiGaetano, 2014). In an attempt to improve education by weeding out schools that 
were unable to compete, the 1998 School Standards and Framework Act identified special measures that 
would be taken if a school failed inspection. Those schools that did not show improvement would 
be shut down (DiGaetano, 2014). 

Global Transformation toward Testing for Accountability 

Although there are some signs that early adopters (i.e., the U.K. and U.S.) are taking marginal 
steps away from holding schools accountable, this has not slowed the global transformation toward 
testing for accountability. In the U.K., regional autonomy has led Scotland to scrap its testing 
program (Volante, 2007) and England to eliminate tests for 14-year-olds in 2008 (Eurydice, 2009a); 
however, league tables and between school competition still dominate U.K. education policy. In the 
U.S., Congress has failed to reauthorize ESEA and provide an alternative to NCLB. With Congress 
deadlocked, President Obama pushed through his seminal policy, Race to the Top, and provided 
waivers to states to circumvent some of the requirements put forth by NCLB, namely the arbitrary 
2014 deadline for 100% proficiency. Both policies, however, continue the NCLB emphasis of basing 
school evaluations on comparable school level data which is available to the public and tying test 
scores to teacher livelihood through the implementation of “pay for performance” schemes (Dillon, 
2011; McNeil & Klein, 2011; Smith & Rowland, 2014). 

Regardless of perceived steps away from testing for accountability by the U.S. and U.K., the 
global expansion continues full speed as countries follow the early examples of the U.S. and U.K. 
and engage in the “ubiquitous adoption of accountability policies” (Hanushek & Raymond, 2004, p. 
407). Testing for accountability is viewed as a common solution to education problems around the 
world, and an important part in the global education compact (Mundy, 2006) or “global education 
reform movement” (Sahlberg, 2010, p. 47), as illustrated in Butland (2008), Lemke et al. (2004), and 
Figlio and Loeb (2011). Furthermore, accountability mechanisms that leave control to regional or 
national authorities are substantially similar across countries (Macnab, 2004). The adoption of testing 
for accountability by diverse countries around the world (see section 7 below for examples) suggests 
that “the development and implementation of accountability systems has been one of the most 
powerful, perhaps the most powerful, trend in educational policy in the last 20 years” (Volante, 
2007, p. 4). 

In many countries donor agencies play a significant role in shaping national policy. 
Increasingly multilateral organizations and international finance institutions reinforce testing for 
accountability by linking loan conditions to assessment infrastructure and policy (Kamens & 
McNeely, 2010). Of note is the World Bank’s movement toward supporting a testing culture. In 
their content analysis of the World Bank’s Education Sector Strategy 2020, released in 2011, Joshi & 
Smith (2012) find a nearly 100% increase in terms associated with the testing culture from the prior 
1999 Sector Strategy paper. This included an astonishing increase in mentions of “accountability” 
from twice in the 1999 strategy to 32 times in the latest release. This led the authors to question 
whether a World Bank focus on “test-based education may be crowding out emphasis on well-rounded 
skills, personality development, and critical thinking that might come from a more 
problem-solving or dialogic approach to education” (p. 192). The recent establishment of the World 
Bank’s Systems Approach for Better Education Results (SABER) tool provides additional evidence of a global trend towards increased 
accountability. SABER is a voluntary tool for nations to evaluate the effectiveness of their education 
system. It is strongly encouraged by the World Bank and includes provisions for national and 
international assessments. Countries that do not implement a national testing policy that ensures 
accountability or fail to participate in international assessments, such as PISA, receive lower grades 
(Bruns, Filmer, & Patrinos, 2011). Although voluntary in nature, the normative pressure placed on 
national leaders by SABER pushes countries toward adopting testing for accountability policies. 

Explaining the Turn Toward Testing for Accountability 

Similar to the convergence of Democratic and Republican policy in the U.S. or Conservative 
and Labour policy in the U.K., the global education compact that emerged at the beginning of the 
21st century was a compromise between traditionally neo-liberal institutions, such as the World Bank 
and International Monetary Fund, and the equity focused United Nations (Daun & Mundy, 2011; 
Mundy, 2006). The “ideal” governance laid out in the compact largely mirrors the priorities of the 
New Right, including a focus on decentralization, incorporating the private sector into education, 
and the use of standardized tests (Mundy, 2006; Rose, 2005). Solidified in the Dakar Forum on 
Education for All and the Millennium Development Project (Kitaev, 2004; Mundy, 2006), the 
practices and expectations outlined in the global education compact are part of a larger world 
culture. 

World Culture theory suggests that the global acceptance of testing for accountability can be 
understood as part of a larger cultural and collective process based around shared global values and 
ideas of legitimacy (Meyer, 1977). World Culture theory is one strand of a larger theoretical 
framework known as Neo-institutionalism, Sociological Institutionalism, or World Society theory 
(Schofer et al., 2012; Wiseman, Astiz & Baker, 2013). Neo-institutionalism sees “social action as 
deriving from culture, knowledge, and authority rooted in global institutions and structures” 
(Schofer et al., 2012, p. 57). Institutions, such as the family, religion, and education, help construct 
culture by expanding social roles and legitimating action and knowledge (Baker et al., 2006; Meyer, 
1977). For example, “modern educational systems formally reconstruct, reorganize, and expand the 
socially defined categories of personnel and of knowledge in society” (Meyer, 1977, p. 72). The 
cultural products of institutions are therefore shaped by the institution, strengthening and 
reinforcing its authority (Baker et al., 2006). Institutions influence actors at the local, regional, 
national, and international level through molding the culture they are embedded in (Schofer et al., 
2012). 

Individual actors recognize what is appropriate behavior within a given culture by the 
normative scripts or cultural models associated with each social role (Schofer et al., 2012). Scripts 
guide actors, telling them how to feel or act in the world (Baker, 2014; Baker et al., 2006). As within 
any culture, deviation from the script is subject to public scrutiny. The socialization process through 
which institutions create socially accepted scripts and culturally enforced role compliance often 
results in the internalization of acceptable and logical ways to engage in the surrounding social 
environment (Baker et al., 2006). The result is behavior that is taken for granted or understood as 
common sense and, therefore, beyond question. Essentially, culture has shaped behavior by 
identifying some actors and actions as legitimate while dismissing others (Jepperson, 2002). 

Cultural institutions spur a process of cultural alignment across populations known as 
isomorphism (Wiseman, Astiz, & Baker, 2013). Education, as an institution, is an interesting 
example of isomorphism. From a neo-institutional perspective, the presence of similar education 
models globally is the result of shared meanings and values that identify appropriate rules and 
routines (Baker, 2014; Wiseman, Pilton, & Lowe, 2010). Schools and education systems then 
“become isomorphic with the institutional environment in order to achieve legitimacy and ensure 
their survival” (Booher-Jennings, 2005, p. 234). 

Components of the World Culture 

As an increasingly normative policy lever that provides legitimacy to countries that practice it 
and reconstructs the notion of education, teachers, and students, testing for accountability is one of 
the components of the dominant world culture. Embedded within the testing for accountability 
movement is a taken-for-granted belief in neoliberalism and the power of competition to 
produce quality. Kamens & McNeely (2010) suggest that the growth in assessment internationally 
represents a move towards a world educational ideology that consists of unfailing faith in science as 
the path for legitimate knowledge and a belief that organizations can be managed to produce a 
desired outcome. As illustrated by policy practices in the World Bank, there is a growing 
international consensus that participation in international testing and the use of a national 
assessment system are essential in a legitimate education system. Additionally the number of 
countries involved in testing for accountability is likely to increase as more policymakers are “doing 
what is expected of them by their individual and institutional peers” (Wiseman, 2010, p. 2). 

Multiple elements of the emerging world culture provide justification for testing for 
accountability systems, including: the expansion of western models, an emphasis on academic 
intelligence, faith in science as the rational path to truth, and the decentralization of authority to the 
local level. 

Western Model. The world culture is western in origin, shaped through the western 
university's charter to produce legitimate knowledge and spread through the rapid expansion of 
educational attainment known as the education revolution (Baker, 2014; Ramirez, 2003). Because the 
United States is an advanced schooled society, many American ideas of education can foreshadow global outcomes 
(Baker, 2014). Notably, the western model preaches education for human development and 
education as a human right (Baker, 2014; Kamens & McNeely, 2010). Situated within this model is 
the increasing importance of education in later life outcomes. The position of schooling as both a 
private and public good encourages countries to implement mandatory policies requiring that all children 
attend (Baker, 2014). Since all students have the ability to achieve, stratified schooling is deemed 
unjust and decisions regarding equal rights to an education no longer center on access but quality. 
When combined with a western view of education as “a ‘technical’ science that can be studied, 
rationalized, and quantified” (Wiseman, 2010, p. 18), it is not surprising that the right to a high 
quality education leads to a push toward measurable indicators that can be used for accountability 
purposes (Kamens & McNeely, 2010). 

Academic Intelligence. The education revolution has reinforced the cognitive and 
scientific dimensions of legitimate knowledge (Baker, 2014). Understood collectively as academic 
knowledge, this legitimate knowledge emphasizes meta-cognitive skills and the value of empirical 
evidence (Wiseman, 2010). Subjects that encourage this type of knowledge, namely mathematics and 
science, are considered valuable as demonstrated by an increasing use of mathematics achievement 
scores in public comparisons of schools and countries (Baker, 2014). The centrality of science and 
mathematics to academic intelligence is reflected in the work of Kamens, Meyer, and Benavot (1996), who 
found these subjects were no longer restricted to specialist knowledge but were now available to and 
intended for all students. Additionally, less academic subjects, such as the visual arts, have been 
dismissed in favor of subjects that produce academic intelligence (Baker, 2014). 

Faith in Science. Science production, measured by the number of scientists, scientific 
publications, scientific training, and the number of countries with a national science program, 
continues to increase globally (Baker, 2014). The swelling of science production is often called for by 
policymakers and practitioners who believe science to be an objective arbiter of truth (Rosen, 2009). 
Critics of the unquestionable faith in science recognize that “as long as the public maintains this 
irrefutable objectivity of statistics, a graph here and a chart there can leverage support for provincial 
reforms that could never survive nuanced deliberation” (Robertson, 1999, p. 715). Science's taken-for-granted 
position increases the value placed on education that uses test scores to objectively and 
accurately measure student knowledge (Paris & McEvoy, 2000). When international test scores are 
reported, their “seemingly authoritative measure of students’ skills and abilities” (Cohen & 
Rosenberg, 1977, p. 128) prompts nations to respond through the development of appropriate 
educational reform (Drori et al., 2003). 

Decentralization. Decentralization has become a widespread institutional model as “a 
significant set of nations have responded to the legitimizing global forces within a multinational 
economy and world institutional system by adopting decentralization” (Astiz, Wiseman, & Baker, 2002, 
p. 86). In Europe the increasing devolution of responsibility to the local level has been met with 
increased curricular and evaluative control at the regional or national level (Eurydice, 2009a). Testing 
for accountability is likely in decentralized systems because external exams are required to ensure 
education quality across diverse communities, where local control often leads to information 
asymmetry (Woessman, 2004, 2007). From this perspective, “statistical accountability systems” are 
seen as “one way to resolve the dilemma between granting autonomy and authority to educators and 
keeping them under some political control” (Dorn, 2007, p. 13). 

Neoliberalism. Neoliberalism, as an economic system, has spread to nearly every country in 
the world (Friedman, 1999). Neoliberalism promotes private property, open markets, and free trade 
on the basis of three core assumptions: (1) consumers have access to accurate market information, 
(2) consumers act as self-interested profit maximizers, and (3) private provision and competition will 
be more efficient than public control (Harvey, 2005; Jolly, 2003). The diffusion of neoliberalism as a 
legitimate economic approach occurred once it emphasized the general benefits of competition, 
embraced the role of actor as central to global institutions, and was viewed as a legitimate system by 
international institutions (Schofer et al., 2012). 

As a dominant approach to education policymaking, neoliberal practices are often adopted 
by countries seeking legitimacy (Wiseman, 2010). The implementation of neoliberal policy reinforces 
cultural faith in its promise of increased performance with improved efficiency, solidifying the 
position of neoliberalism as a “common sense” approach that cannot be questioned (Apple, 1999; 
Hursh, 2007; Rosen, 2009). Treating education as a market invites the invisible hand of the market 
to improve the quality of schools through between school competition for students (Apple, 1999; 
Levin, 1992). For neoliberal policymakers academic gains must be balanced with financial costs 
(Wiseman, 2010). Public investment in education must be justified (Smeed & Victory, 2010): “the 
public has a right to expect that its resources are being used responsibly and that the public 
institutions are accountable for caretaking the public trust” (Supovitz, 2009, p. 215). Test results, 
therefore, provide an easily measured indicator of quality to ensure the public investment in 
education is being used efficiently and effectively. 

Test Results as Information. The publication of standardized test results provides parents, 
acting in the role of consumers, with easily interpretable market information allowing them to put 
more pressure on schools to respond to consumer demand (Ball, 1993; Woessman, 2007). The 
ability of parents to act rationally as self-interested consumers is a prerequisite to an effective market 
(Apple, 1999). The use of test results as information helps overcome the principal-agent problem, 
recognized by many neoliberal economists (Figlio & Loeb, 2011). In situations where a principal (i.e., 
parents) hires an agent (i.e., educators) to perform a service, if the interests of the principal and the 
agent do not align, the service may not be performed efficiently. Student test scores provide a 
common metric of measurement for evaluation, informing the principal of the agent's real 
performance and providing evidence to ensure that the needs of the principal are being met 
(Woessman, 2004). Test results can also be conceptualized as quality indicators, which can help 
direct resources (Joshi & Smith, 2012) and aid the government in targeting inefficient schools to be 
shut down (Lincove, 2009). The use of test scores in this manner is acceptable as they are often 
considered the only legitimate measure of quality in education. With this authority, the 
implementation of testing is increasingly seen as an end in education, instead of a means to improve 
student understanding (Booher-Jennings, 2005). 

Test scores are used to spur parent involvement in the education process (Smith & Rowland, 
2014). The U.K. used its league tables to enlist parents in making market choices, with the 
government telling parents that “you should get all the information you need to keep track of your 
child’s progress, to find out how the school is being run, and to compare all schools” (Department 
of Education, 1994, p. 3). While there is some evidence to support the use of this information in 
parents' decision-making regarding their child’s school attendance (Teelken, 1999), others find 
parents pay little attention to the publication of data (de Wolf & Janssens, 2007). Janssens and 
Visscher (2004) suggest that parents pay little attention to this information because they do not have 
access to the information, they lack the capacity to understand the information, they have limited 
real choice between schools, and the information does not reflect the factors that parents base their 
decision on. 


National Testing Policy Categories 

As the expansion of world culture legitimates and institutes testing for accountability as a 
taken-for-granted education policy, it is important to remember that variation in national policy 
remains. Examining national policy is important because “national ministries of education typically 
act as agents imposing this activity [testing] on schools and education systems” (Kamens & 
McNeely, 2010, p. 6). Differences in national testing policy can best be seen on a rough continuum 
based on the presence and intensity of testing for accountability (see Table 2). This categorization 
strategy is similar to the scheme of Eurydice (2009a), and the stated purposes of education 
policymakers often transcend policy categories (Eurydice, 2009a). 


Table 2 

National Testing Policy Categories 

Testing for Assessment, Advancement, or Accreditation 

Formative: The use of national or regional examinations as diagnostic tools that are used 
internally by schools to inform instruction. 

Summative: The use of national or regional examinations as a tool that summarizes student 
learning and is shared with parents; when disseminated is done so at the national or regional level. 

Testing for Accountability 

Evaluative: The use of national or regional examinations as a tool that summarizes student 
learning and is disseminated at the school level to allow for between school comparisons. 

Punitive: The use of national or regional examinations as a tool that summarizes student 
learning, is disseminated at the school level, with school/class level results used to apply 
rewards or sanctions. 


At the far left of the continuum are formative testing policies which use tests for assessing the 
progress of students. In this system, tests are ongoing, informing teacher instruction through direct 
feedback (Eurydice, 2009a). Formative policies are often professed by policymakers; however, more 
often than not this gesture is simply policy rhetoric (Irons & Harris, 2006). Summative testing policies 
use tests to assess, provide accreditation, or direct students' educational advancement. They 
summarize how well an individual is doing at a given point in time (Eurydice, 2009a; Nitko & 
Brookhart, 2006), and when scores are disseminated, this is done at the national or regional level. 
Traditional high-stakes student tests that do not aggregate results at the school level and tests that 
disaggregate scores by ethnic, economic, or regional subgroup, but not school, fall into this policy 
category. 

Evaluative and punitive testing policies both apply testing for accountability. In Evaluative 
testing policies, test scores aggregated at the school level are used by the public to compare schools and 
evaluate school quality. In this economically based model, “responsibility is devolved to the 
individual consumer and the aggregate of consumer choices provides the discipline, of accountability 
and demand, that the producer cannot escape from” (Ball, 1995, p. 69). Informal consequences in 
this system result from a consumer choice market mechanism and public stigmatization through 
“naming and shaming” (de Wolf & Janssens, 2007) or the “scarlet letter” effect (Harris & 
Herrington, 2006). 

In Punitive testing policies, school level aggregate data is used to apply consequences through the 
application of formal rewards or sanctions. The primary difference between evaluative and punitive 
testing policies can be found in the consequences. Implicit consequences through public pressure 
characterize evaluative systems while explicit consequences through formal channels characterize 
punitive systems. Punitive systems function through a behaviorist model, which suggests individual 
action can be molded through incentives (Hanushek & Raymond, 2004). Punitive testing policy 
responds to reformers who believe serious consequences are necessary to transform the education 
system and has both compliance and avoidance costs, making it expensive to implement 
(McDonnell & Elmore, 1987). 
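
The distinctions among the four categories can be summarized as a simple decision rule. The following Python sketch is illustrative only and is not part of the original categorization scheme. The function name, the three yes/no inputs, and the assumption that formal rewards or sanctions are by themselves sufficient to mark a policy as punitive are simplifications introduced here.

```python
from enum import Enum


class TestingPolicy(Enum):
    """The four points on the rough continuum of national testing policy."""
    FORMATIVE = "formative"
    SUMMATIVE = "summative"
    EVALUATIVE = "evaluative"
    PUNITIVE = "punitive"


def classify_testing_policy(formal_consequences: bool,
                            school_level_results_public: bool,
                            summarizes_student_learning: bool) -> TestingPolicy:
    """Hypothetical helper that places a policy on the continuum.

    Assumption: the checks run from the most to the least intense form of
    accountability, so the first condition that holds decides the category.
    """
    # Explicit rewards or sanctions tied to school/class level results.
    if formal_consequences:
        return TestingPolicy.PUNITIVE
    # School level results are published, so only implicit consequences
    # (consumer choice, public stigmatization) apply.
    if school_level_results_public:
        return TestingPolicy.EVALUATIVE
    # Results summarize learning but are shared only with parents or
    # disseminated at the national or regional level.
    if summarizes_student_learning:
        return TestingPolicy.SUMMATIVE
    # Results are used internally by schools to inform instruction.
    return TestingPolicy.FORMATIVE
```

Under these assumptions, a system that publishes school level results without attaching formal rewards or sanctions is classified as evaluative, whereas attaching formal consequences to those results moves it into the punitive category.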

Examples of National Turns toward Testing for Accountability 

To illustrate the turn toward testing for accountability the following section explores three 
heterogeneous examples of countries that have transitioned toward a punitive testing policy. The 
three examples of Hungary, Mexico, and South Korea indicate that the transition toward testing for 
accountability is not limited by a country’s economic status or regional position. As illustrated in 
Figure 3, the movement toward a punitive policy is not dependent on the relative achievement of a 
nation’s education system. Similar to Table 2, Figure 3 uses yellow to indicate an evaluative testing 
policy and red to designate a punitive testing policy. Looking at mathematics test scores across 
four rounds of PISA (2000, 2003, 2006, and 2009), no evidence of increasing achievement is 
observed as countries turn toward more intense measures of accountability. 

Figure 3. Country mean mathematics scores for South Korea, the United States, Hungary, and 
Mexico as measured by the 2000, 2003, 2006, and 2009 rounds of PISA. 

Source: PISA 

Hungary 

Hungary is a country that has traditionally been without a national testing system but has 
seen dramatic changes over the past two decades (Eurydice, 2009b). After spending 33 years as part 
of the Soviet bloc, education policies in the 1980s and 1990s provided a lot of freedom, if not a lot 
of guidance. The 1985 Education Act abolished the previous inspection system but failed to replace 
it with a viable way to measure education quality. The rapid democratization and decentralization of 
Hungary following the collapse of the Soviet Union led it to have one of the most decentralized 
education systems in Europe (Eurydice, 2009a, 2009b). During the 1990s, in a shift that was partially 
fueled by national performance on international assessments, it became clear that without a national 
test and with dissension in defining education quality, assessment reform was needed. Starting in 
2001, ‘monitoring surveys’ were implemented yearly and the National Assessment of Basic 
Competencies (NABC) was established (Eurydice, 2009a, 2009b). Originally testing 5th and 9th 
graders, the NABC quickly expanded to include 4th, 6th, and 8th grade. The early goal of the test was 
to develop a within school evaluation culture and allow schools to compare their results to nationally 
aggregated sub-groups. However, since 2006 results have been disseminated to the general public, 
providing the school’s clients (parents) with information on school effectiveness (Eurydice, 2009a). 
This evaluative policy shifted quickly to a punitive policy when in 2008 schools were mandated to 
incorporate test scores into internal quality reports. Low achieving schools were then required to use 
this report to prepare and implement an action plan for remediation (Eurydice, 2009b). As illustrated 
in Figure 3, during this period of rapid transition toward a punitive testing system, Hungary saw a 
less than 0.5% improvement in PISA mathematics scores. 

Mexico 

Since the passage of the National Agreement for the Modernization of Basic Education in 
1992, Mexico has applied a punitive national testing policy by tying student test scores to explicit 
rewards for teachers. Interestingly, this policy, usually opposed by teacher unions, was adamantly 
supported by the Mexican teachers' union. Hecock (2014) suggests that this support was partially due 
to the union’s ability to use its position as an entrenched powerful institution, at a time when 
democratization in the country was in its infancy, to co-opt the policy. Specifically, the union was 
able to ensure that once a performance raise was given it could not be rescinded and that 
administrators would be subject to a separate merit pay system. 

The pay for performance policy, known as Carrera Magisterial (CM), links the results of the 
Instrument for Testing New Secondary School Pupils (IDANIS) to teacher bonuses (Ferrer, 2006). 
Teachers volunteer to participate in CM, which provides rewards of up to 300% of their base wage 
(Hecock, 2014). Teachers are evaluated on a one hundred-point scale, covering a myriad of 
outcomes. Over time student test scores have played a larger role in teacher evaluation; points 
associated with student test scores increased from 20 between 1999 and 2011 to 50 today (Hecock, 
2014). Student test scores are also publicly available. After initial differences due to capacity at the 
regional level, by 2000 nearly all of Mexico participated in publishing school level data (Ferrer, 2006; 
Hagerstrom, 2006). Additional regional variation is present in the concentration of high-performing 
teachers, with the segregation of high-performing teachers in more advantaged communities 
exacerbating between school differences (Luschei, 2012). 

In 2001 the quality schools program (PEC) was established as a voluntary program in which 
schools could apply for a competitive grant by submitting a five-year improvement plan. As the 
program increases in popularity, there are concerns about its financial feasibility (Hagerstrom, 2006). 
National mean scores in mathematics over the past decade have increased by roughly 8% (see Figure 
3); however, considering the early establishment of a punitive system, more information is needed to 
see whether the improvement can be partially attributed to the system's presence, duration, or other 
unrelated factors. 

South Korea 

In 1991, South Korea decentralized its education system. Since that time the Korean 
Institute for Curriculum and Evaluation (KICE) has been responsible for the administration of 
national assessments. The National Assessment of Educational Achievement (NAEA) is a 
criterion-referenced test administered in the 6th, 9th, and 10th grades focusing on Korean, mathematics, science, 
social studies, and English. Initially the NAEA was a sample-based test with results aggregated and 
disseminated at the national level (Chung, 2014). In 2007, plans were unveiled to move to a census 
test and publish results at various levels, including the school. President Lee Myung-bak declared 
that moving to an evaluative policy was essential for school choice to work. Teachers and teacher 
unions opposed the move on the grounds that Korea was already a high scoring country and 
national policy should be focusing on “creative” skills, and as a result school level publication was 
delayed until 2011 (Schmidt, Houang, & Shakrani, 2009). In 2011, the government began to target 
low performing schools known as ‘creative management schools that pursue academic ability 
enhancement’ (Kim et al., 2010) and “similar to the provisions in the No Child Left Behind Act in 
the U.S., Korea’s core plan is to provide additional support to schools with lots of children who are 
underperforming, but only for a specific period of time” (Schmidt, Houang, & Shakrani, 2009, p. 
56). The transition into a punitive policy, however, slowed in 2013 as the new president, Park Geun-hye, 
felt pressure from the public and teachers and eliminated tests at the elementary school level and cut 
back middle school assessments (Chea, 2014). During the period in which Korean leadership pushed toward 
more intense forms of testing for accountability, national mathematics scores on PISA saw 
essentially no change. However, in the years directly before and after President Lee Myung-bak’s 
announced change in school test score reporting, the national mean score dropped by approximately 
2% (see Figure 3). 


Toward a Normative Testing Culture? 

With more countries transitioning toward testing for accountability systems it is important to 
move beyond describing testing as part of the world culture to investigate the substantive outcomes 
of this global trend (Schofer et al., 2012). World Culture theory is often criticized for its lack of 
attention to power and its tendency to describe the components and breadth of the culture without 
explicitly addressing the potential consequences. For example, Carney, Rappleye, and Silova (2012) 
state that World Culture theory undertheorizes aspects of agency and power and suggest that,
by recognizing practices such as shadow education and decentralization as part of the larger
world culture, World Culture theorists are implicitly endorsing such practices as efficient and
effective. While one can argue that origins and outcomes are not central in a theory focused on 
describing commonalities across heterogeneous environments, an important next step, once 
legitimate practices embedded in the world culture are identified, is the examination of potential 
beneficiaries of this world culture. As testing for accountability expands globally, it is important for
researchers and policymakers to take this next step and explore potential positive and negative 
consequences of tying students’ test scores to educator livelihood.

Past meta-analyses identify a positive effect of accountability on student test scores ranging 
from a marginal effect size of 0.10 (Belfield & Levin, 2002) to a medium effect size of 0.55 (Phelps,
2012).¹ However, these studies were dominated by U.S. examples and did not distinguish between
high stakes testing for advancement and educator focused testing for accountability. Additionally, 
the type of testing for accountability applied can lead to divergent results. Studies suggest that 
evaluative policies, in which schools are compared through the publication of results, have a positive
effect on student test scores, although the “practical significance of this gain is negligible” (Springer, 
2008, p. 5). Additionally, in punitive systems, where explicit consequences are present, student 
achievement is higher (Dee & Jacob, 2009). When evaluative and punitive systems are compared, the 
relative advantage of punitive policy outweighs the market pressure of evaluative systems (Bishop et 
al., 2001; Hanushek & Raymond, 2005). However, a recent study in which participants from the 
2009 PISA were categorized into the four national testing policy categories described above found 
no difference between student math achievement in Summative, Evaluative, and Punitive systems
once schools that use student achievement as a criterion in admission were accounted for (Smith,
2014).

¹ The effect size in Lee (2008) lies between these two studies at 0.27.

Concurrent with the potential achievement gains from increased accountability are concerns 
that educators game the system to ensure their survival and that testing for accountability systems 
and their corresponding school practices increase inequity. Past studies suggest that educators in 
testing for accountability systems are more likely to shape the testing pool by increasing student 
repetition (Hursh, 2005a; Kornhaber, 2004a, 2004b), reclassifying students into special education 
(Cullen & Reback, 2006; Haney, 2000; Jacob, 2005), or suspending low achieving students during 
testing periods (Figlio, 2006). Schools in testing for accountability systems are also more likely to 
narrow the curriculum by shifting their resources toward tested subjects and away from non-tested
subjects (Rentner et al., 2006; Supovitz, 2009), participate in increased test practice (Au, 2007;
Cuban, 2007; Nicols & Berliner, 2007; Yarema, 2010), including explicit direct instruction (Certo,
2006), and focus instructional energy on those students closest to the desired proficiency level 
(Booher-Jennings, 2005; Gillborn & Youdell, 2000; Lipman, 2004; Reback, 2008). Finally, upsetting 
trends in equity are present as the effect of accountability on student achievement varies significantly 
by student ethnicity (Kornhaber, 2004a; McNeil, 2000; McNeil & Valenzuela, 2001; Wheelock, 
Bebell & Haney, 2000), native language (Monk, Sipple, & Killeen, 2001), and prior achievement 
(Jacob, 2001; West & Pennell, 2000). 

The global transformation toward testing for accountability outlined in this article should 
lead policymakers to question whether the potential benefits of implementing such a program 
outweigh the concerns, especially among the most marginalized student groups. Unfortunately, rich
discussions of this nature are less likely among national decision makers in the future, as testing for 
accountability is increasingly legitimated as a neoliberal script, which lays out appropriate action for 
nation-states within a world culture that emphasizes faith in science, academic knowledge, and 
western style education. Testing as the way to acquire valued knowledge is now taken for granted. 

As testing for accountability becomes a normative practice, engrained in the educational landscape 
of more and more countries, it has the potential to reconstruct education and how it is perceived by 
its actors and the general public. 

Testing for accountability, as a neoliberal policy, reinforces the reconceptualization of 
students, parents, and teachers, as products, consumers, and producers (Carl, 1994). With educator 
survival on the line, students are judged by their proclivity to pass a test. Remedial students, with 
long odds of passing the test, are considered a liability. The social construction of this student group 
as a “liability” happens early in schooling and may follow the path of similarly constructed social
categories that are now engrained in the rhetoric of education, “dropouts” and “at-risk” students 
(Baker, 2014; Fine, 1991; Swadener & Lubeck, 1995). In a schooled society, students who struggle in
school will be identified as deviant and, with education increasingly used as the only legitimate form
of stratification, ‘liability’ students will be marred by that status for the rest of their lives (Baker,
2014).

As a centerpiece of testing for accountability systems, between-school competition shifts
“schools modi operandi from those based on moral purpose towards those that emphasize 
productivity and efficiency” (Sahlberg, 2010, p. 48). Once productivity and efficiency are established 
as the essential aim of schooling, testing for accountability policies may be criticized but will rarely 
be abolished (Tyack & Cuban, 1995). Competition between educators sharply contrasts with more 
cooperative models that are important for healthy school climates and student and teacher 
motivation. In systems where scores are aggregated at the class level teachers may be concerned 
about peer judgment and being stigmatized as ‘bad’ teachers (Booher-Jennings, 2005). This 
tumultuous situation leads to educators blaming others, especially those in earlier grades, for not
adequately preparing students (Wiggins & Tymms, 2000). Internal feelings of anxiety and shame
among teachers are exacerbated by a belief that an emphasis on testing for accountability has a
negative effect on public education, often stymieing their motivation (Certo, 2006; Jones & Egley,
2006; Smeed & Victory, 2010). Additionally, the focus on test scores suggests that everyone is 
welcome in teaching as long as they can produce the test gains needed in an accountability system 
(Hopmann, 2008). This assumption of importance threatens the professional position of teachers, 
delegitimizing it as a profession that requires long-term pedagogical training. The importance of test 
scores as information assumes that parents engage with the data and use it to drive their children’s 
school placement. Parents who are uninformed, do not use evidence-based decision-making, or do not
participate in school choice will be shunned by society (Ball, 1993). Evidence of subjective 
evaluations can already be found in the literature. For example, Woessman (2004) suggests that 
parents’ ability and willingness to make use of available information is a measure of “how strongly
parents care for their children’s progress” (p. 4). 

The long-term societal effect of treating education as a market, concerned primarily with 
private returns, is a reduction in public spending on social services, such as education and health 
(Hopmann, 2008). These reductions may be partially ameliorated by increased private support, 
accelerating the movement toward privatization. Evaluating quality in education is a challenging 
endeavor due to education’s multifaceted short-term and long-term outcomes. Test results provide a
simple, relatively easy to understand measure that can be used with investors who want to ensure 
their money is going towards a high quality product. Recent literature suggests this is the case, as the 
publication of results increases the amount of voluntary contributions a school receives (Figlio &
Kenny, 2009). The publication of results can also shape society through residential segregation. 
Families use school level test scores to judge the quality of the school system, leading to the
establishment of more “highly desirable” neighborhoods and affecting real estate prices (Figlio &
Lucas, 2004). 

Testing for accountability is so engrained in many countries that it is partially
self-perpetuating (Dorn, 2007). The use of assessment data reinforces the testing culture (Baker &
Wiseman, 2005) and the public view of testing for accountability as synonymous with high 
expectations makes it challenging for policymakers to alter established practices, whether or not they 
want to, in fear of constituents labeling them soft on education (Paris & McEvoy, 2000). If the 
potential benefits and concerns of testing for accountability are not critically evaluated this global 
transformation will lead to a testing culture that is internalized as normative and adopted as 
individual values. This shift in constitutional mind-sets has the ability to affect the whole of society 
as “deeply engrained ways of understanding the relationship between the public and its institutions” 
(Hopmann, 2008, p. 425) are altered. Recognizing the cultural diffusion of testing for accountability 
and evaluating it before its cultural entrenchment is essential for the world to avoid the challenges 
faced by early adopters.